Application of Random Forest and Multiple Linear Regression Techniques to QSPR Prediction of an Aqueous Solubility for Military Compounds.
Abstract
The relationship between the aqueous solubility of more than 2800 organic compounds and their structures was investigated using a QSPR approach based on the Simplex Representation of Molecular Structure (SiRMS). The training set consists of 2537 diverse organic compounds. Multiple Linear Regression (MLR) and Random Forest (RF) methods were used for statistical modeling at the 2D level of representation of molecular structure. The statistical characteristics of the best models are good (MLR: R² = 0.85, Q² = 0.83; RF: R² = 0.99, R²(oob) = 0.88). An external validation set of 301 compounds (including 47 nitro-, nitroso- and nitrogen-rich compounds of military interest), none of which were included in the training set or the modeling process, was used to evaluate the models' predictivity. Well-fitted and robust models (R²(test) = 0.76 for MLR and R²(test) = 0.82 for RF) were thus obtained for both statistical techniques using descriptors based on topological structural information only. The predicted solubility values for the military compounds are in good agreement with the experimental ones. The developed QSPR models represent a powerful and easy-to-use virtual screening tool that can be recommended for prediction of aqueous solubility.
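The abstract reports a fitted R² and a cross-validated Q² for the MLR model. As a minimal sketch of how the two statistics differ, the following computes both for a one-descriptor least-squares fit. It assumes Q² is obtained by leave-one-out cross-validation (the abstract does not state the scheme), the data points are made up for illustration, and the function names are hypothetical rather than taken from the paper.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single descriptor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out Q2 (an assumed scheme): each point is predicted
    by a model fitted on all the other points."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    return r_squared(ys, preds)


# Made-up descriptor values and log-solubility values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-0.9, -2.1, -2.9, -4.2, -4.8, -6.1]

a, b = fit_line(xs, ys)
r2 = r_squared(ys, [a * x + b for x in xs])
q2 = q_squared_loo(xs, ys)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

For a least-squares fit with an intercept, the leave-one-out Q² cannot exceed the fitted R², which matches the pattern in the reported MLR values (0.83 versus 0.85). The same `r_squared` helper applied to predictions for held-out compounds gives an external-set statistic analogous to R²(test), although conventions differ on which mean is used in the denominator.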
Related resources
Prediction of boiling point and water solubility of crude oil hydrocarbons using sub-structural molecular fragments method
The quantitative structure–property relationship (QSPR) method is used to develop the correlation between structures of crude oil hydrocarbons (80 compounds) and their boiling point and water solubility. Sub-structural molecular fragments (SMF) calculated from structure alone were used to represent molecular structures. A subset of the calculated fragments selected using stepwise regression (fo...
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Random Forest Models To Predict Aqueous Solubility
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offer...
Novel enhanced applications of QSPR models: Temperature dependence of aqueous solubility
A model developed to predict aqueous solubility at different temperatures has been proposed based on quantitative structure-property relationships (QSPR) methodology. The prediction consists of two steps. The first one predicts the value of k parameter in the linear equation lgSw=kT+c, where Sw is the value of solubility and T is the value of temperature. The second step uses Random Forest tech...
Quantitative Modeling for Prediction of Critical Temperature of Refrigerant Compounds
The quantitative structure-property relationship (QSPR) method is used to develop the correlation between structures of refrigerants (198 compounds) and their critical temperature. Molecular descriptors calculated from structure alone were used to represent molecular structures. A subset of the calculated descriptors selected using a genetic algorithm (GA) was used in the QSPR model development...
Journal title:
- Molecular Informatics
Volume 29, Issue 5
Pages: -
Publication date: 2010